System and method for multiresolution scalable audio signal encoding

ABSTRACT

An audio signal analyzer and encoder is based on a model that considers audio signals to be composed of deterministic or sinusoidal components, transient components representing the onset of notes or other events in an audio signal, and stochastic components. Deterministic components are represented as a series of overlapping sinusoidal waveforms. To generate the deterministic components, the input signal is divided into a set of frequency bands by a multi-complementary filter bank. The frequency band signals are oversampled so as to suppress cross-band aliasing energy in each band. Each frequency band is analyzed and encoded as a set of spectral components using a windowing time frame whose length is inversely proportional to the frequency range in that band. Low frequency bands are encoded using longer time frames than higher frequency bands. Transient components are represented by parameters denoting sinusoidal shaped waveforms produced when the transient components are transformed into a real valued frequency domain waveform. Stochastic or noise components are represented as a series of spectral envelopes. The parameters representing the three signal components compose a stream of compressed encoded audio data that can be further compressed so as to meet a specified transmission bandwidth limit by deleting the least significant bits of quantized parameter values, reducing the update rates of parameters, and/or deleting the parameters used to encode higher frequency bands until the bandwidth of the compressed audio data meets the bandwidth requirement. Signal quality degrades in a graduated manner with successive reductions in the transmitted data rate.

This application claims benefit of U.S. Provisional Appl. No. 60/035,576, filed Jan. 16, 1997.

The present invention relates generally to systems for analyzing, encoding and synthesizing audio signals, and also to systems for transmitting compressed, encoded audio signals over variable bandwidth communication channels.

BACKGROUND OF THE INVENTION

It is a basic premise of audio signal encoding techniques that if one has a perfect model of the instrument or device that is creating a sound, then the amount of data required to encode the sound will be very small, resulting in very high data compression ratios. For instance, to record a piano (or any other instrument) playing a single note, such as middle C, using full compact disk (CD) recording techniques (e.g., 44,100 samples per second, 16 bits per sample), results in a huge amount of information per second (e.g., 705.6 kbps or 88,200 bytes per second). However, if it is known that the sound being recorded emanates from a piano and both the sound analysis system that is recording the sound, and the receiving systems that will reproduce the recorded sound, have perfect models of the piano, then the only data required will be the data required to indicate the note being played (1 byte is more than sufficient to identify which of the 88 notes on a piano is being played), and the note's amplitude (perhaps 1 additional byte), plus data sufficient to identify the beginning and ending of the playing of that note. (This is equivalent to the data on a printed page of music.) In a simple data recording system using a piano model, data identifying the piano note being played can be recorded once every sample period, where a typical sample period would be 10 or 20 milliseconds, resulting in a data recording rate of 100 to 200 bytes per second. Obviously a data rate of 200 bytes per second represents a great deal of data compression from the full 88,200 bytes per second rate, and in fact indicates a compression ratio of 441 to 1. In more realistic, real world audio analysis and recording systems, compression ratios of 10 to 1 or so are generally considered to be very good.

As presented in U.S. Pat. No. 5,029,509, the use of sinusoidal modeling for speech and audio signals is well established. In audio signal analysis and recording systems using sinusoidal modeling, an audio signal is analyzed each sample period to determine the sinusoidal signal components of the signal during that sample period. For example, the sinusoidal components will often be a fundamental frequency component and a set of harmonics. Any portion of the signal not easily represented as sinusoidal components is typically represented as stochastic noise through the use of noise envelope parameters.

However, actual applications of sinusoidal modeling have been generally limited to single-speaker speech and single-instrument (monophonic) audio. More recently, there have been various attempts to perform sinusoidal modeling on wideband, polyphonic (or multisource) audio signals for the purposes of data compression. The present invention provides an improved audio signal analysis and representation method that provides significant benefits and better compression than the prior systems known to the inventors.

In traditional sinusoidal analysis methods, the input audio signal is first broken into uniformly sized segments (e.g., 5 to 50 millisecond segments), and then processed through one or several fast Fourier transforms (FFT) to determine the primary frequency components of the signal being processed. The process of breaking the input sound into segments is referred to in the literature as "windowing", or multiplying the input digital audio with a finite-length window function. Once the spectral peaks have been identified, parameters (such as frequency, amplitude, and phase) for each spectral component are determined, quantized and then stored or transmitted. This method works well if the input is a monophonic source, and the traditional analysis methods can determine what the single fundamental frequency happens to be.

In the case of general audio signal compression, there can be any number of audio sources (polyphonic) and thus multiple fundamental pitches. It is well known that the traditional methods of windowing and frequency component identification give poor results on wideband audio signals.

The present invention is premised on the theory that the aforementioned poor results are caused primarily by two problems: 1) a fundamental tradeoff between time resolution and frequency resolution, and 2) failure to accurately model the onset of each note or other audio event. The present invention also addresses the failure of prior art systems to provide graceful degradation of signal quality as the data transmission bandwidth is gradually decreased and/or as an increasing fraction of the transmitted data is lost during transmission.

The tradeoff between time resolution and frequency resolution manifests itself in the following scenario. If a signal analysis procedure is designed to have very good pitch resolution, say, ±5 Hz, which may be necessary for resolving bass notes, then the corresponding window will have to be about 200 milliseconds long. As a result, the analysis procedure will have very good pitch resolution, but the time resolution (i.e., the determination of the temporal onset and termination of each frequency component) will be very poor. Any time a partial begins (a new frequency track), its attack will be smeared across the entire window of 200 milliseconds. This makes the attack dull, and gives rise to a problem called "pre-echo": when a receiving system synthesizes an audio signal based on the audio parameters generated while using wide windows, synthesized coding error noise (like smeared partial attacks) is heard before the actual attack begins.
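As a rule of thumb (ignoring window-shape factors), the achievable frequency resolution Δf of a windowed analysis is the reciprocal of the window duration T, so the 200 millisecond figure above follows directly:

    T ≈ 1/Δf = 1/(5 Hz) = 200 ms (8,820 samples at 44,100 samples per second)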

Another problem associated with prior art audio data encoders is that the compressed audio data produced by those encoders is not easily scaled down to lower data rates. Most high-quality wideband audio algorithms in use as of the end of 1996 (such as MPEG and AC-3) use perceptual transform coders. In these systems the digital audio is broken into frames (usually 5 to 50 milliseconds long), each frame is converted into spectral coefficients using a time-domain aliasing cancellation filter bank, and then the spectral coefficients are quantized according to a psychoacoustic model. The most recent version of these "transform-based" audio coders, known as MPEG2-AAC, can have very good compression results. A CD-quality sound signal having 44,100 samples per second and 16 bits per sample, having 22 kHz bandwidth and a data rate of 705.6 kbps, is compressed to a signal having a data rate of about 64 kbps, which represents a compression ratio of 11:1.

While 11:1 is a very good compression ratio, transform coders have their limitations. First of all, if the available transmission data rate (i.e., between a server system on which the compressed audio data is stored and a client decoder system) drops below 64 kbps, the sound quality decreases dramatically. In order to compensate for this loss of quality, the original audio input must be band limited in order to reduce the data rate of the compressed signal. For example, instead of compressing all audible frequencies from 0-20000 Hz, the encoding system may need to lowpass filter any frequencies above 5500 Hz in order to compress the audio to fit in a 28.8 kbps transmission channel, which is the typical bandwidth available using the modems most frequently found on desktop computers in 1997.

Another limitation of the transform encoders is that the encoding technique is not scalable. On a computer network like the Internet, the actual bandwidth available to a user with a 28.8 kbps modem is not guaranteed to be 28.8 kbps. Sometimes the user will actually receive 28.8 kbps, but the actual available bandwidth can easily drop at various times to 18 kbps, 6 kbps, or anywhere in between. If a transform coder compresses audio to generate encoded data having a data rate of 28.8 kbps, and the data rate suddenly drops to only 20 kbps, the audio quality of the sounds produced by client decoder systems will not gracefully degrade. Rather, the transform coder will produce silence, noise bursts, or poor time-domain interpolation. Clearly, it would be highly desirable for the quality of the sounds synthesized by client decoders to degrade gracefully as the available bandwidth decreases and when random data packets are dropped or lost during transmission. Graceful degradation means that the listener will not hear silence or noise, but rather a gradual decrease in perceptual quality.

SUMMARY OF THE INVENTION

In order to enable a more accurate analysis of polyphonic (multisource) signals that avoids the pre-echo problem, the present invention uses a multiresolution approach to spectral modeling.

In summary, the present invention is a musical sound or other audio signal analysis system that is based on a model that considers a sound to be composed of three types of elements: deterministic or sinusoidal components, transient components representing the onset of notes or other events in an audio signal, and stochastic components. The deterministic components are represented as a series of overlapping sinusoidal waveforms. To generate the deterministic components, the input signal is divided into a set of frequency bands by a multi-complementary filter bank 132. The frequency band signals are oversampled so as to suppress cross-band aliasing energy in each band. Each frequency band is analyzed and encoded as a set of spectral components using a windowing time frame whose length is inversely proportional to the frequency range in that band. Thus, low frequency bands are encoded using much longer windowing time frames than higher frequency bands.

The transient components are represented by parameters denoting sinusoidal shaped waveforms produced when the transient components are transformed into a real valued frequency domain waveform by an appropriate transform. The stochastic or noise component is represented as a series of spectral envelopes.

From the representation of audio signals by parameters representing the above described three signal components, sounds can be synthesized that, in the absence of modifications, can behave as perceptual identities, that is, they are perceptually equal to the original sound. Furthermore, the compressed encoded audio data can be further compressed so as to meet a specified transmission bandwidth limit by deleting the least significant bits of quantized parameter values, reducing the update rates of parameters, and/or deleting the parameters used to encode higher frequency bands until the bandwidth of the compressed audio data meets the bandwidth requirement. Due to the manner in which the audio signal is encoded, signal quality degrades gracefully, in a graduated manner, with successive reductions in the transmitted data rate.

BRIEF DESCRIPTION OF THE DRAWINGS

Additional objects and features of the invention will be more readily apparent from the following detailed description and appended claims when taken in conjunction with the drawings, in which:

FIGS. 1 and 2 are block diagrams of a polyphonic audio signal analysis system.

FIG. 3 is a flow chart depicting operation of a portion of the audio signal analysis system that performs transient signal analysis and synthesis of a reconstructed transient signal waveform.

FIG. 4 depicts the format of a packet of compressed audio data.

FIGS. 5 and 6 are block diagrams of an audio signal synthesizer that generates audio signals from parameters received from the audio signal analysis system of FIGS. 1 and 2.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 shows a "signal flow" representation of an audio signal analyzer and encoding system 100, while FIG. 2 depicts a preferred computer hardware implementation of the same system. The primary purpose of the analyzer/encoder system 100 is to generate a compressed data stream representation of an input audio signal that efficiently represents the psychoacoustically significant aspects of the input audio signal. Typically, the compressed audio data will be stored in computer storage devices or media. The compressed audio data is delivered either on media or by various communication channels (such as the Internet) to various client decoder systems 200 (see FIGS. 5, 6). The compressed audio data is encoded by the analyzer/encoder system 100 in a way that facilitates further compression of the audio data so as to meet any specified communication bandwidth limitation and to enable "graceful degradation" (also called gradual degradation) of the quality of the audio signal produced by decoder systems 200 as the available communication bandwidth decreases (i.e., the signal quality of the regenerated audio signal is commensurate with the available bandwidth). The client decoder systems 200 synthesize a regenerated audio signal from the received, compressed audio data.

The server computer(s) used to communicate compressed audio data to client decoder systems 200 may be different computers than the analyzer/encoder computers 100 used to encode audio signals.

The audio signal analyzer/encoder system 100 preferably includes a central processing unit (CPU) 102, a user interface 104, an audio output device 108, a digital signal processor (DSP) subsystem 110, and memory 112. Memory 112, which typically includes both random access memory and non-volatile disk storage, stores an operating system 114, an audio signal analysis control program 116, and audio signal data 130. The DSP subsystem 110 includes a digital signal processor (DSP) 120 and a DSP memory 122 for storing DSP programs and compressed audio parameters 124. The DSP programs will be described in more detail below.

The use of a DSP 120 is optional, especially in applications where the audio signal analyzer system 100 does not need to analyze audio data in real time. In an alternate embodiment, the analyzer system 100 simply uses a single, reasonably powerful CPU, such as a 200 MHz Pentium Pro or a 200 MHz PowerPC microprocessor. In these alternate embodiments, all the "DSP procedures" described below are procedures executed by the main (and only) CPU 102, and all the audio analysis and system control procedures are stored in a single, integrated memory storage system 112.

The analyzer/encoder system 100 receives an audio signal 130 on an input line 131, which may be part of the user interface 104, or may be a data channel from the system's main memory 112. For the purposes of this explanation, it is assumed that the input audio signal 130 is a sampled digital signal, sampled at an appropriate data rate (e.g., 44,100 samples per second). The input signal is first processed by a multi-complementary filter bank 132 that splits the input audio signal into several octave-band signals 136 on lines 138. More generally, the band signals 136 contain contiguous frequency range portions of the input audio signal. A multi-complementary filter is used to guarantee that no aliasing energy is present inside the octave-band signals 136. A description of multi-complementary filters can be found in N. Fliege and U. Zolzer, "Multi-Complementary Filter Bank," ICASSP 1993, which is hereby incorporated by reference as background information.

The multi-complementary filter bank 132 has the same basic filter structure as the pyramid coding filters used for image processing, with an additional lowpass filter in the middle to remove aliased components. In return for having no aliasing energy present, the signals are oversampled by a factor of two. Thus the multi-complementary filter bank 132 used is not a critically sampled filter bank. That is, the band signals 136 generated by the filter 132 are not critically sampled. The term "critically sampled band data" means that the total amount of data (i.e., the number of data samples) is equal to the amount of data (i.e., number of data samples) prior to its division into band data. In the preferred embodiment of the present invention, the number of samples in the band data is twice the number that would be used in critically sampled band data. However, because the analysis system 100 does not quantize the octave band signals directly, but rather generates sinusoidal parameters from them, the oversampling is not a problem. Once again, it is noted that the reason for oversampling the data in each band signal 136 is to suppress cross-band aliasing energy.

In a preferred embodiment, the input audio signal is preprocessed by the filter bank 132 into six octave-band channels at a 44.1 kHz sampling rate. Each octave-band signal 136 has a different length analysis window that is used for generating a respective stream of spectral model synthesis (SMS) parameters 142. This allows bass notes to be correctly analyzed with high frequency precision (using long windows at low frequencies), but also reduces pre-echo problems with high-frequency attacks like cymbals (good time resolution with short windows). The six octave bands used in the preferred embodiment, and the number of subsamples generated by the filter bank for each analysis window, are as follows:

                  TABLE 1
    ______________________________________
    Filter Bank Windows
                                          subsamples
          window      effective   window  generated   sampling rate
          size        bandwidth   size    per window  (Fs =
    band  (samples)   (Hz)        (ms)    period      44,100 Hz)
    ______________________________________
    6      128      11000-22000     2.9     128       Fs
    5      256       5500-11000     5.8     128       Fs/2
    4      512       2750-5500     11.6     128       Fs/4
    3     1024       1375-2750     23.2     128       Fs/8
    2     2048        687-1375     46.4     128       Fs/16
    1     4096           0-687     92.9     128       Fs/32
    ______________________________________

The sampling rate in Table 1 refers to the rate of the data in the band relative to the rate of data in the original signal.
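The patent does not set out the filter coefficients of the multi-complementary bank itself. The following sketch is therefore only a minimal pyramid-style octave split, in which the simple halfband design, tap count and function names are illustrative assumptions rather than the Fliege-Zolzer structure; it shows how each retained band can be kept at twice its critical sampling rate, as Table 1 requires:

    import numpy as np
    from scipy.signal import firwin, lfilter

    def octave_split(x, n_bands=6):
        # Pyramid-style octave split; each retained band stays at 2x its
        # critical sample rate, mirroring the oversampling used to
        # suppress cross-band aliasing energy.
        bands = []
        taps = firwin(127, 0.5)              # linear-phase halfband lowpass
        d = (len(taps) - 1) // 2             # its group delay, in samples
        for _ in range(n_bands - 1):
            low = lfilter(taps, 1.0, np.concatenate([x, np.zeros(d)]))[d:]
            bands.append(x - low)            # upper octave (complement)
            x = low[::2]                     # keep lower half, decimate by 2
        bands.append(x)                      # lowest band (0-687 Hz in Table 1)
        return bands                         # highest band first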

The subsamples generated by the filter bank 132 for each octave band are then analyzed by a respective sinusoidal component identifier 140. In a preferred embodiment, the sinusoidal component identifier 140 is implemented using a short time frame FFT. The FFT identifies spectral peaks within each band signal 136, and produces a parameter tuple representing the frequency, amplitude and phase of each identified spectral component. As shown in Table 1, the FFT analysis time frame is different for each band 136. The time frame length for each band 136 is selected to maximize the accuracy of frequency component identification while maintaining reasonably good accuracy on identifying the time at which each frequency component begins and ends.

The time accuracy for frequency component identification depends on (A) the window period, and (B) the hop size (i.e., the number of samples by which the FFT window is advanced for each subsequent frequency analysis of the band signal). If a hop size of 1:1 were used, indicating that each band sample is analyzed by the FFT only once, then the time accuracy of each frequency component would be the same as the window size. In the preferred embodiment, a hop size of 4:1 is used for all channels. In other words, for a channel having 128 samples per window, the FFT is advanced 32 samples for each successive spectral analysis of that band. As a result, the time accuracy of the frequency component identifications is one fourth the window time for each band signal 136.
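A minimal sketch of this per-band analysis follows, assuming the 128-sample window and 4:1 hop of Table 1. The bare local-maximum test and the function name are illustrative simplifications of the sinusoidal component identifier 140; a practical identifier would also interpolate peak positions and link peaks into frequency tracks across hops:

    import numpy as np

    def sinusoid_peaks(band, fs, win=128, hop=32):
        # STFT peak picking: one set of (time, freq, amp, phase) tuples
        # per hop; hop = win // 4 gives the 4:1 hop size described above.
        w = np.hanning(win)
        tuples = []
        for start in range(0, len(band) - win + 1, hop):
            X = np.fft.rfft(band[start:start + win] * w)
            mag = np.abs(X)
            for k in range(1, len(mag) - 1):
                if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]:
                    tuples.append((start / fs, k * fs / win,
                                   mag[k], np.angle(X[k])))
        return tuples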

The sinusoidal component parameters 142 produced by the FFT analysis (i.e., a parameter tuple representing the frequency, amplitude and phase of each identified spectral component) for each respective band signal 136 are components of a stream of parameters 144 generated by the audio signal analyzer 100.

The same sinusoidal component parameters 142 are also passed to a sinusoid waveform synthesizer 146, which generates a "deterministic" signal 148 composed of a set of sinusoidal waveforms. Sinusoid waveform synthesizer 146 may use a bank of (software implemented) oscillators, or inverse Fourier transforms, to generate the sinusoidal waveforms. The deterministic signal 148 represents the sinusoidal portion of the input audio signal. A signal subtracter 150 then subtracts the deterministic signal 148 from the input audio signal 130 to generate a first residual signal 152 on line 154.
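For a single frame, the sinusoid waveform synthesizer 146 can be approximated by a bank of software oscillators. The sketch below is a simplified stand-in; it omits the cross-frame amplitude and frequency interpolation and the overlap-add that a production synthesizer would use:

    import numpy as np

    def synthesize_frame(tuples, n, fs):
        # Oscillator-bank resynthesis of n samples from (freq, amp, phase)
        # tuples; summing cosines is the software-oscillator option.
        t = np.arange(n) / fs
        out = np.zeros(n)
        for freq, amp, phase in tuples:
            out += amp * np.cos(2 * np.pi * freq * t + phase)
        return out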

In summary, the first portion of the audio signal analyzer extracts and parameterizes all periodic, sinusoidal, steady-state energy from the input audio signal 130. By using a multiresolution windowing methodology, the customary tradeoff between time resolution and frequency resolution is avoided.

Transient Modeling

Despite the relatively good time accuracy of the parameters 142 representing the deterministic portion of the input audio signal, and the virtually complete elimination of the "pre-echo" problem, the inventors have found that a synthesized audio signal generated from the deterministic signal parameters 142 is still much "muddier" than the sound quality generated by a music compact disk (CD). Of course, a music CD has a tremendously higher data rate than the parameters 142 generated using the sinusoidal component analysis portion of the analyzer 100, so a difference in sound quality would be expected. However, the inventors have determined that there is a way to analyze and encode a "transient signal portion" of the residual signal 152 in such a way as to compensate for the muddiness of the regenerated deterministic signal 148, while only modestly increasing the overall data rate of the parameter stream 144. The amount of data required to encode the transient signal portion of the residual signal is typically one fifth to one half as much data as is required to encode the deterministic portion of the input audio signal.

In a preferred embodiment, the residual signal 152 on line 154 is processed by a transient component identifier 156 to extract sudden attacks or onsets (i.e., when an instrument first begins to play a note) in the input audio signal 130. These transients, or onsets, are not periodic or steady-state in nature. Therefore, the present invention uses a different parametric model to characterize them. From another viewpoint, the transients being encoded by the transient component identifier represent the difference between the "true sinusoidal portion," including note attacks, onsets and endings, of the input audio signal, and the deterministic signal 148. By efficiently identifying and encoding these transitions, a much more accurate representation of the non-stochastic portion of the input audio signal is produced.

To analyze and parameterize the transients in an input audio signal, the present invention exploits the duality of time and frequency. The transient analyzer 156 finds time domain transients by (A) mapping frames (also called time segments) of the original time domain signal into the frequency domain, (B) determining the spectral peaks of the resulting frequency domain signal, and (C) generating SMS-like parameter tuples (i.e., frequency, amplitude and phase) to represent the identified spectral peaks. The resulting parameters can be used by a decoder system 200 (described below with reference to FIGS. 5 and 6) to accurately regenerate the transient components of an audio signal.

More specifically, referring to FIG. 3, the transient signal component identifier 156 (which is preferably implemented as a set of data analysis procedures executed by the encoding system's CPU 102 or DSP 120) first segments the residual signal 152 on line 154 and the regenerated deterministic signal 148 into a set of frames, herein called time segments, such as 1 second time segments (step 160). For each time segment, a first average energy value is computed for the residual signal 152 and a second average energy value is computed for the deterministic signal 148, and both signals are normalized with respect to their average energy levels for that time segment. Thus, the two normalized signals each have, on average, equal normalized energy levels. Next, the normalized residual signal (for the time segment) is scanned for energy peaks. In a preferred embodiment, this peak detection is performed by further segmenting the normalized residual and deterministic signals into mini-segments (e.g., 2 or 3 milliseconds each in duration), and then making the following determination for each mini-segment i:

    If NE(RS)_i - NE(DS)_i > Δ, then a residual energy peak is located in mini-segment i

where NE(RS)_i represents the normalized energy of the residual signal for mini-segment i, NE(DS)_i represents the normalized energy of the deterministic signal for mini-segment i, and Δ represents a normalized threshold value (typically a value between 0.01 and 1, such as 0.5). Once all the mini-segments with residual energy peaks have been identified, each such identified peak is converted into a pair of frequency values called a "frequency guideline" in accordance with the position of the peak in the time segment.
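A minimal sketch of this peak detection (steps 160-161 of FIG. 3) follows, under the assumption that "normalized" means each one second segment is scaled to unit average energy; the function and variable names are illustrative:

    import numpy as np

    def residual_energy_peaks(residual, determ, seg=44100, mini=100, delta=0.5):
        # Flags mini-segments whose normalized residual energy exceeds
        # the normalized deterministic energy by more than delta.
        flags = []
        for s in range(0, len(residual) - seg + 1, seg):
            r = residual[s:s + seg]
            d = determ[s:s + seg]
            r = r / np.sqrt(np.mean(r ** 2) + 1e-12)   # unit average energy
            d = d / np.sqrt(np.mean(d ** 2) + 1e-12)
            for i in range(seg // mini):
                ne_r = np.mean(r[i * mini:(i + 1) * mini] ** 2)
                ne_d = np.mean(d[i * mini:(i + 1) * mini] ** 2)
                if ne_r - ne_d > delta:
                    flags.append((s, i))               # (segment start, mini index)
        return flags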

To give an even more specific example, given an analysis/encoder system 100 in which the input audio signal 130, deterministic signal 148 and the residual signal 152 are each digital sampled signals with 44,100 samples per second, the deterministic and residual signals are segmented into 1 second segments, each having 44,100 samples, and are each normalized with respect to their respective average energy levels for the 1 second segment. Each time segment is then divided into 441 mini-segments, each having 100 samples (representing about 2.2 milliseconds of data). The normalized energy of the residual and deterministic signals are then determined for each 100-sample mini-segment, and the threshold comparison is made to determine which mini-segments represent residual energy peaks.

If, for example, the 2nd, 100th and 221st mini-segments are the ones with residual energy peaks, the mapping of those peaks into frequency guidelines works as follows. The three mini-segments with energy peaks represent the following data samples in the larger time segment: 101-200, 9901-10000, and 22001-22100. These are each converted into "frequency guidelines" simply by dividing each data sample position value by two and rounding down to the closest integer:

    Frequency Guidelines = 50-100 Hz, 4950-5000 Hz, and 11000-11050 Hz.

Thus, residual energy peaks close to the beginning of a time segment are mapped to low frequencies and residual energy peaks closer to the end of the time segment are mapped to higher frequencies.
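The position-to-guideline mapping can be stated compactly. The helper below is hypothetical, but it reproduces the worked example above (the 2nd mini-segment, covering samples 101-200, maps to 50-100 Hz):

    def frequency_guideline(mini_index, mini=100):
        # mini_index is zero-based, so mini_index=1 is the 2nd mini-segment.
        lo = mini_index * mini + 1          # first sample position (1-based)
        hi = lo + mini - 1                  # last sample position
        return (lo // 2, hi // 2)           # halve and round down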

If no residual energy peaks are detected in a time segment (step 161), no transient signal parameters are generated for that time segment (step 162). Otherwise, transient signal parameters are generated for the time segment, using the above determined frequency guidelines, as follows (steps 163-167). The first step of this process (step 163) is to transform the data samples of the residual signal for the time segment into a real valued set of frequency domain values. The transform used in the preferred embodiment is the Discrete Cosine Transform (DCT). The mapping performed by the time to frequency domain transformation causes transients in the time domain to become sinusoidal in the frequency domain. Other transforms that could be used for this purpose include the modified DCT, the Discrete Sine Transform (DST), and modulated lapped transforms.

When a DCT is performed on the 44,100 samples of the residual signal time segment, the transform generates 44,100 real valued DCT coefficients. In step 164, these DCT coefficients are treated as though they were a time domain signal for the purpose of locating sinusoidal waveforms in the DCT "signal." More particularly, in step 164, the DCT coefficients are analyzed using a short time FFT to detect sinusoidal waveforms in the DCT signal. In a preferred embodiment, the FFT uses a window size of 2048 samples, and a hop size of 2:1 (meaning that there is a 50 percent overlap between successive windows analyzed by the FFT). For each of the FFT windows (44 such windows are used in the preferred embodiment for each time segment), all frequency peaks located between the guideline frequencies are identified and identification tuples (e.g., indicating frequency, amplitude and phase) are generated as the transient signal parameters. These 44 sets of identification tuples represent the transient portion of the residual signal 152.
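A sketch of the DCT of step 163 and the DCT-domain peak picking of step 164 follows. Treating the length of the one second coefficient block as an equivalent "sampling rate" makes each FFT bin line up with the guideline units used above; this is an assumed, simplified reading of the text rather than a verified implementation:

    import numpy as np
    from scipy.fft import dct

    def transient_params(residual_seg, guidelines, win=2048, hop=1024):
        # DCT the 1-second residual segment, then pick sinusoidal peaks
        # in the coefficient sequence with a 2:1 hop (50% overlap),
        # keeping only peaks that fall inside a frequency guideline.
        C = dct(residual_seg, norm='ortho')  # time transient -> DCT sinusoid
        fs_eq = float(len(C))                # 1-second block: N coeffs ~ "fs"
        w = np.hanning(win)
        params = []
        for start in range(0, len(C) - win + 1, hop):
            X = np.fft.rfft(C[start:start + win] * w)
            mag = np.abs(X)
            for k in range(1, len(mag) - 1):
                if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]:
                    f = k * fs_eq / win      # peak position in guideline units
                    if any(lo <= f <= hi for lo, hi in guidelines):
                        params.append((start, f, mag[k], np.angle(X[k])))
        return params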

The transient signal parameters 158 are similar to the sinusoid component parameters 142 used to represent the deterministic portion of the input signal, except that the transient signal parameters 158 represent a frequency domain mapping of a time domain signal, whereas the sinusoidal component parameters 142 represent the frequency components of a time domain signal. Typically, the transient signal parameters 158 are a very sparse set of parameters and will have a lower associated data rate than the corresponding sinusoidal component parameters 142.

As an example, if there were an ideal impulse in the first residual signal 152, then the transient component identifier 156 would initially perform a DCT of a frame of data that included the impulse. If the impulse were at the beginning of the frame (in time), then the DCT coefficients corresponding to the impulse would form a low frequency sinusoid waveform. If the impulse were at the end of the frame, then the DCT coefficients corresponding to the impulse would form a high frequency sinusoid waveform. Sinusoidal modeling is performed on the DCT coefficients. The FFT procedure used to analyze the DCT coefficients does not "know" that it is processing DCT coefficients and not time-domain data. If the FFT procedure locates a DCT-domain sinusoid, a low-bandwidth parametric representation of that sinusoid is generated.

In order to increase the effectiveness and efficiency of the transient signal identification process, the procedure restricts the spectral peaks of the frequency domain signal to those associated with residual energy peaks detected in step 160. Since the DCT of a transient signal is a sinusoidal waveform, determining where transients occur in the time domain enables the procedure to know, in advance, what range of sinusoidal components will exist in the frequency domain signal. The tracking of spectral peaks of the frequency domain signal is restricted to these sinusoidal components. Of course, in alternate embodiments, steps 160-162 could be skipped, so as not to restrict the frequency domain tracking of transient signals.

Noise Modeling

To model and encode the stochastic, noise component of the input audio signal 130, a transient component signal 170 corresponding to the transient signal parameters 158 is generated by a transient signal synthesizer 172 and subtracted from the first residual signal 152 by a signal subtracter 174 to generate a second residual signal 176 on line 178. The transient signal synthesizer 172 generates the transient component signal 170 by performing an inverse FFT on the transient signal parameters (or by using a bank of oscillators) so as to generate a set of sinusoidal waveforms (FIG. 3, step 165), and performing an inverse DCT on those sinusoidal waveforms to synthesize a reconstructed transient signal 170 for the relevant time segment (step 166). The reconstructed transient signal is then subtracted from the first residual signal 152 to generate a second residual signal 176 (step 167).
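Sketching the inverse path of steps 165-166: the DCT-domain sinusoids are rebuilt from the stored tuples and then inverse transformed. This crude version regenerates each sinusoid over the whole coefficient block rather than window by window, so it illustrates only the structure of the synthesis, not a production implementation:

    import numpy as np
    from scipy.fft import idct

    def synthesize_transient(params, n):
        # Rebuild DCT-domain sinusoids from (window start, freq, amp,
        # phase) tuples, then inverse-DCT back to a time-domain transient.
        C = np.zeros(n)
        k = np.arange(n)
        for _start, f, amp, phase in params:
            C += amp * np.cos(2 * np.pi * f * k / n + phase)
        return idct(C, norm='ortho')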

The second residual signal 176 represents the stochastic portion of the input audio signal after subtraction of the deterministic, sinusoidal components and transient components represented by the sinusoidal component parameters 142 and the transient component parameters 158. In a preferred embodiment, this remaining, second residual signal 176 is analyzed and encoded in the same manner as taught by U.S. Pat. No. 5,029,509. Since the second residual signal 176 is typically a low level, slowly varying "noise floor," it can be encoded by a noise component encoder 180 in several different ways. For instance, the second residual signal can be encoded by the noise component encoder 180 as a line segment approximation of the residual signal's spectral envelope (i.e., by a set of magnitude values for a number of discrete frequency values). Alternately, the spectral envelope of the residual noise signal 176 can be represented as a set of LPC (linear predictive coding) coefficients, or an equivalent set of lattice filter coefficients. Thus, the noise component encoder 180 typically operates by performing an FFT spectral analysis of the residual noise signal 176, and then generating a set of values or coefficients 182 that represent the spectral envelope of the residual noise signal 176.
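If the LPC option is chosen, the autocorrelation method gives the coefficients directly. The sketch below shows one conventional fit; the model order of 12 is an arbitrary assumption, not a value taken from the patent:

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc_envelope(noise, order=12):
        # Autocorrelation-method LPC fit to the second residual; the
        # coefficients describe its spectral envelope.
        r = np.correlate(noise, noise, mode='full')[len(noise) - 1:]
        a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
        return a                             # predictor coefficients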

Quantization, Storage and Bandwidth Limited Transmission of Compressed Audio Data

The sinusoidal component parameters 142, transient component parameters 158, and noise modeling parameters 182 together form a data stream 144 representing the input audio signal. Prior to "permanent storage" of the data stream 144, the parameters in this data stream are first quantized by a parameter quantizer procedure 183 in accordance with a psychoacoustic model so as to reduce the number of data bits requiring storage. In other words, more data bits are allocated to perceptually important parameters than less important parameters. In a preferred embodiment, groups of parameters within each octave band are quantized as a group using a well known technique called vector quantization, where each quantized vector represents a set of several parameters. For instance, one vector might be used to represent the frequency and amplitude of the four strongest frequency components of a particular octave band. Furthermore, the quantized vectors are organized in a tree structure such that if the N least significant bits of the vector representation are deleted (and replaced by a fixed value such as 0 by the receiving decoder system), the resulting selected quantized vector remains the best vector representation of the associated parameters for the number of bits used to represent the vector. Vector quantization is very efficient in contexts in which there are detectable time or frequency patterns or correlations associated with various audio "voices" in the input audio signal. For instance, an instrument such as a person's voice or a cello will typically have a detectable pattern of harmonics for each note that repeats from one time sample period to the next.

In general, regardless of whether the generated parameters are quantized in groups using vector quantization or parameters are quantized individually, or some combination thereof, the quantization for each parameter or group of parameters is performed in such a way that the number of bits for each parameter or group can be reduced simply by eliminating a selected number of the least significant bits of the quantized parameter or group in accordance with any specified "data compression level". Thus, a parameter that is quantized and encoded with 6 bits of data will still have meaning and will be useable by a client decoder system if one or two (or even more) of its least significant bits are dropped in order to achieve a target data stream bandwidth.
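The bit-dropping operation itself is simple to state in code. The function below is an illustrative sketch of the server-side truncation; as described above, the decoder re-pads the dropped positions with zeros and still lands on the best coarser-resolution codeword:

    def truncate_bits(quantized, n_drop):
        # Drop the n_drop least significant bits of a quantized parameter
        # or tree-structured vector quantization index.
        return (quantized >> n_drop) << n_drop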

The resulting quantized parameters are called the "compressed audio parameters" or the "compressed audio data," and these are typically stored in a non-volatile storage device 184. More specifically, the quantized parameters are typically grouped into data packets 190 (see FIG. 4) that are then stored in the storage device 184, where the data in each data packet 190 will be the data for one time frame, such as the window period associated with the lowest octave band (e.g., 92.9 milliseconds). Referring to FIG. 4, each data packet 190 stored on device 184 will typically include:

a time sequence number 191 to indicate the time index associated with the compressed audio data in the packet;

a four-bit compression level value 192, which is preferably initially set to zero for data packets when they are stored and which may be later reset to a value associated with a lower transmission bit rate at the time the packet is transmitted to a client decoder system;

a packet bit syntax 193, which indicates how the sinusoidal, transient and noise parameters have been encoded and quantized so that the receiving system can decode the quantized data 194 in the packet; and

the quantized, compressed audio data 194.
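In code form, the stored packet of FIG. 4 might be modeled as follows. The field types are illustrative assumptions, since the text specifies only the field order and the four-bit width of the compression level:

    from dataclasses import dataclass

    @dataclass
    class StoredPacket:                 # one time frame, per FIG. 4
        time_sequence: int              # 191: time index of the frame
        compression_level: int          # 192: 4 bits, 0 when first stored
        bit_syntax: bytes               # 193: how the payload was encoded
        payload: bytes                  # 194: quantized, compressed audio data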

The transient component parameters, which are computed on a 1 second time frame basis, and the noise component parameters, which are also updated relatively slowly, are preferably distributed over the set of data packets representing a 1 second time frame (e.g., 11 data packets).

As indicated in FIG. 4, when a data packet of compressed audio data is transmitted, the corresponding transmission data packet 195 includes one or more packet headers 196 required for routing the packet to one or more destinations, and a data corruption detection value 197, which is usually a CRC value computed on the entire contents of the packet (possibly excluding the packet headers 196, which may include their own, separate CRC values). The packet headers 196 and CRC value 197 are typically generated at the time each data packet is transmitted by the appropriate operating system data transmission protocol procedures. Furthermore, if a data packet representing one time frame would exceed the maximum allowed packet size for a particular communication network, then that packet is segmented into a sequence of smaller packets that satisfy the network's packet size requirements.

Compressed Audio Data Distribution Server or Subsystem

In some contexts, the compressed audio data will be copied onto media such as computer diskettes, CDs, or DVDs for distribution to various server computers or even client computers. Alternately, the encoder computer system 100 can also be used as a compressed audio data distribution server. A compressed audio data distribution server (or subsystem) 186 will generally include a storage device 184 that stores a copy of the compressed audio data for one or more "programs," a transceiver 187 (typically a network interface) for transmitting data packets to client decoder systems and for receiving information from the client systems about the available bandwidth between the server and client, and a parameter parser and selecter 188.

In particular, in a preferred embodiment, the parameter parser and selecter 188 receives an available bandwidth value, either from the client decoder system or any other source, and determines from the available bandwidth how much of the encoded audio data to transmit. For example, if the full, CD quality encoded audio data has an associated data rate of approximately 64 kbps, and the available bandwidth is less than 64 kbps, the data to be transmitted is reduced in a sequence of steps until the remaining data meets the bandwidth requirement. In one embodiment, there are 10 data compression levels, the first of which (compression level 0) represents the full set of stored encoded data. The successive data reductions associated with each of the other nine compression levels are as follows:

                  TABLE 2
    ______________________________________
    Data Compression by Parameter Parsing and Selection
    Compression
    Level   Data Reduction
    ______________________________________
    1       Drop sinusoid parameters (and/or groups of parameters)
            assigned the fewest number of bits in the current frame.
    2       Update the noise signal only 10% as often as usual.
    3       Band limit the signal by deleting parameters representing
            the highest octave band.
    4       Band limit the signal by cutting the update rate in half for
            the second highest octave band.
    5       Reduce number of bits used for remaining parameters by
            deleting the N least significant bits of each parameter.
    6       Delete half of the transient parameters (over the applicable
            1 second frame).
    7       Band limit by deleting parameters representing the second
            highest octave band.
    8       Delete remaining transient parameters and noise parameters.
    9       Transmit only even numbered time frame packets (i.e.,
            transmit only every other data packet).
    ______________________________________

As indicated above, the data reductions are applied cumulatively, and thus at compression level N all the data reductions associated with compression levels 1 through N are applied. The compression level parameter 192 in each transmitted data packet 195 is set to the compression level used by the transmitting server system.
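Selecting the level can then be a simple walk down the ladder of Table 2. In this sketch, rate_for_level is a hypothetical helper that reports the data rate of the stream after the cumulative reductions of a given level:

    def pick_level(rate_for_level, available_kbps):
        # Choose the lowest cumulative compression level whose resulting
        # data rate fits the reported channel bandwidth.
        for level in range(10):         # level 0 = full stored data
            if rate_for_level(level) <= available_kbps:
                return level
        return 9                        # floor: every-other-packet mode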

In an Internet audio data streaming application, two way communication is available between the server (broadcaster of the audio data) and the client decoder system (the listener or receiver). The server delivers compressed audio at a data rate it believes the client can support under current network conditions. If all goes well, the client can receive the exact bit rate the server is supplying with no packet dropouts. If the data rate being transmitted is too high, then the client transmits information back to the server indicating the data rate it can handle. An example of this scenario would be if the server believes the client can receive 20 kbps, but the network is loaded down for a few minutes because of high traffic, and the client reports it can only receive 12.6 kbps. The server then adapts, changes the compression level of the transmitted audio data stream in real time, and delivers an audio data stream having a data rate no greater than 12.6 kbps. Of course, if the client can handle a higher data rate than the server is delivering, then the client can communicate that information to the server, and the server will increase the data rate transmitted (and thus increase the quality as well).

Once the server decides which parameters to send and how many bits to allocate to those parameters, the selected data bits are formatted into a bitstream, segmented into packets, and then transmitted to the receiver via the Internet. In this manner, the server will deliver the best quality of audio that the client can accept at any given time. The current representation will allow the server to transmit compressed data at rates as high as 64 kbps (which is perceptually lossless) and as low as 6 kbps (approximately telephone line quality) and almost any data rate in between. This feature of generating, in real time, data streams having a variety of different data rates from a single master encoded file is not possible with transform based encoders such as MPEG and AC-3, which must encode (from the input audio signal) separate streams for use with various preselected channel bandwidths.

In addition, existing commercial systems must pause when switching bit rates, and the pause is usually on the order of seconds. This is due to the fact that such systems must always buffer enough packets to be able to reshuffle them into their correct order (in case they are received in the wrong order). In contrast, the present invention requires no delay, buffering or silence when switching data rates. The transition is perceptually seamless, as different subsets of sinusoidal parameters from the master high-resolution file are transmitted.

As indicated above, if a packet happens to be lost in transmission, then the missing data can be estimated by interpolating in the sinusoidal parameter domain from values received in the data packets before and after the lost packet. This method of interpolation results in the maintenance of relatively good sound quality despite the loss of entire data packets.
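A sketch of that interpolation, assuming the tuples of the surrounding frames have already been paired by frequency track (the pairing logic and proper phase unwrapping are omitted here):

    def interpolate_lost_frame(prev_frame, next_frame):
        # Estimate a lost frame's (freq, amp, phase) tuples by linear
        # interpolation between the frames before and after the gap.
        return [((f0 + f1) / 2, (a0 + a1) / 2, (p0 + p1) / 2)
                for (f0, a0, p0), (f1, a1, p1) in zip(prev_frame, next_frame)]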

Client Decoder and Synthesizer System

FIG. 5 shows a "signal flow" representation of an audio signal decoder system 200, while FIG. 6 depicts a preferred computer hardware implementation of the same system. The primary purpose of the client decoder system 200 is to synthesize an audio signal from a received, compressed audio data stream. The client decoder system 200 may also determine the available bandwidth of the communication channel between a server and the client decoder system 200 and transmit that information back to the server.

The client system 200 preferably includes a central processing unit (CPU) 202, a user interface 204, an audio output device 208, a data packet transceiver 210 (typically a network interface), and memory 212. In the preferred embodiment, the CPU 202 is a 200 MHz Pentium, 200 MHz Pentium Pro or 200 MHz PowerPC microprocessor, with sufficient data processing capability to synthesize an audio signal from a set of received compressed audio parameters in real time.

In a preferred embodiment, memory 212, which typically includes both random access memory and non-volatile disk storage, can store:

an operating system 214;

an audio signal decoder control program 216;

a receiver buffer 218 for holding one to two seconds of compressed, encoded audio signal data;

a synthesized audio data buffer 220 that is typically used to hold two or three time frames (e.g., about 186 to 279 milliseconds) of synthesized audio data samples ready for playing by the audio output device 208;

a bandwidth availability analyzer procedure 222; and

a set of audio signal synthesizer procedures 224.

The set of audio signal synthesizer procedures 224 includes:

a parameter interpolator 226;

a sinusoid waveform synthesizer 146, which can be identical to the sinusoid waveform synthesizer 146 used in the analyzer/encoder system 100;

a transient waveform synthesizer 172, which can be identical to the transient signal synthesizer 172 used in the analyzer/encoder system 100;

a noise synthesizer 228; and

a waveform adder 230.

The client decoder system 200 receives packets of compressed audio data from a server system via the client system's transceiver 210. The received packets are temporarily stored in a packet buffer 218. Typically, one to two seconds of audio data are stored in the packet buffer 218. By using a packet buffer, small changes in the transmission rate of data packets will not cause data starvation. The received data packets are surveyed by a bandwidth availability analyzer 222 that detects the rate at which data is actually received from the server, and when that data rate is different from the rate at which the server is sending data, it sends an informational packet back to the server to report the actual available bandwidth.

The packets in the packet buffer are processed by an interpolator, decompression and inverse quantization procedure 226. If data packets have been dropped, or if some model parameters have not been sent by the server due to bandwidth limitations, interpolation is performed to regenerate the lost or unsent parameters. In addition, if some of the least significant bits of the received parameters have been deleted by the server due to bandwidth limitations, the deleted bits are replaced with predefined bit values (e.g., zeros) so as to decompress the transmitted model parameters. Finally, the quantization of the model parameters is reversed so as to regenerate values that are equal to or close to the originally generated model parameters (i.e., sinusoidal waveform, transient waveform and stochastic component parameters).

In addition, some of the parameters, such as those for transient components and stochastic components, may be distributed across numerous packets, and those distributed sets of parameters are reconstructed from as many of the received packets as are needed.

The resulting reconstructed model parameters are then used by respective ones of the three synthesizer procedures 146, 172 and 228 to synthesize sinusoidal waveforms, transient waveforms and spectrally shaped stochastic noise waveforms. The resulting waveforms are combined by a waveform adder 230 to produce a synthesized audio signal, which is temporarily stored in a buffer 220 until it is ready for output by the audio output device 208. As indicated above, the sinusoid waveform synthesizer 146 and the transient waveform synthesizer 172 both operate in the same manner as was described above with respect to the server analyzer and encoder system 100. The spectrally shaped noise synthesizer 228 is preferably implemented as a lattice filter driven by a random number generator, with the filter's lattice coefficients being determined by the received audio data.
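An equivalent direct-form sketch of the noise synthesis follows; the lattice filter of the preferred embodiment is replaced here by the corresponding all-pole filter, a substitution made for illustration only:

    import numpy as np
    from scipy.signal import lfilter

    def noise_synth(lpc_a, n):
        # Spectrally shaped stochastic component: white noise driven
        # through the all-pole LPC synthesis filter 1/A(z).
        excitation = np.random.randn(n)
        return lfilter([1.0], np.concatenate(([1.0], -lpc_a)), excitation)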

Time and Pitch Modifications

Using the audio signal parameters generated by the audio signal encoder 100, it is relatively easy to make time and pitch modifications to the stored, encoded audio program. In order to stretch a segment of music in time without changing its pitch, a decoder/synthesizer simply changes the spacing of the sinusoidal, transient and noise parameters in time. In order to change the pitch of a piece of music without altering its speed, only the sinusoidal (frequency) component parameters need to be altered.
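For instance, a pitch shift in the parameter domain reduces to scaling the frequency entries; in this sketch, frames is assumed to be a list of per-frame (freq, amp, phase) tuple lists:

    def pitch_shift(frames, ratio):
        # Pitch change without speed change: scale only the frequencies.
        # (A time stretch would instead rescale the frame spacing.)
        return [[(f * ratio, a, p) for f, a, p in frame] for frame in frames]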

Time and pitch modifications are important for applications such as browsing through an audio program quickly while maintaining intelligibility.

While the present invention has been described with reference to a few specific embodiments, the description is illustrative of the invention and is not to be construed as limiting the invention. Various modifications may occur to those skilled in the art without departing from the true spirit and scope of the invention as defined by the appended claims.

What is claimed is:
1. An audio signal encoder, comprising: means for filtering a digitally sampled audio signal with a multi-complementary filter bank that splits the audio signal into a plurality of band signals, where the plurality of band signals contain contiguous frequency range portions of the audio signal and wherein the band signals are oversampled so as to suppress cross-band aliasing energy in each of the band signals; and means for analyzing each of the band signals, using for each respective band signal a respective windowing time whose length is inversely proportional to the frequency range of the associated band signal, to identify spectral peaks within each band signal and to generate encoded parameters representing each of the identified spectral peaks.
2. The audio signal encoder of claim 1, further including: a sinusoidal signal synthesizer for generating a set of sinusoidal waveforms corresponding to the encoded parameters generated by the band signal analyzing means; a signal subtracter means that subtracts the set of sinusoidal waveforms from the audio signal so as to generate a residual signal; and a transient component analyzer for analyzing and encoding transient signal components in the residual signal with a set of transient component signal parameters.
3. The audio signal encoder of claim 2, the transient component analyzer including: a transform means for transforming frames of the residual signal into real valued frequency domain frames; and an analyzer for identifying spectral peaks in respective ones of the frequency domain frames and encoding the identified spectral peaks so as to generate the set of transient component signal parameters for the respective ones of the frequency domain frames.
4. The audio signal encoder of claim 3, further including: a transient signal synthesizer for generating a reconstructed transient signal from the transient component signal parameters; a second signal subtracter for subtracting the reconstructed transient signal from the residual signal to generate a second residual signal; and a noise component encoder for generating a set of noise modeling parameters representing spectral components of the second residual signal.
5. The audio signal encoder of claim 4, further including: means for assembling a parameter stream from the encoded parameters representing the identified spectral peaks in the band signals, the transient component signal parameters and the noise modeling parameters; and means for reducing transmission bandwidth associated with the parameter stream by performing a subset of a predefined set of bandwidth reduction actions.
6. The audio signal encoder of claim 5, wherein the predefined set of bandwidth reduction actions includes a plurality of actions selected from the set consisting of deleting from the parameter stream a subset of the encoded parameters representing the identified spectral peaks in the band signals, reducing how often the noise modeling parameters are included in the parameter stream, deleting from the parameter stream all encoded parameters representing the identified spectral peaks in a highest frequency one of the band signals, reducing how often the encoded parameters are included in the parameter stream for a second highest frequency one of the band signals, reducing how many bits are used to represent the encoded parameters in the parameter stream, and deleting a subset of the transient component signal parameters.

7. The audio signal encoder of claim 2, further including: a transient signal synthesizer for generating a reconstructed transient signal from the transient component signal parameters; a second signal subtracter for subtracting the reconstructed transient signal from the residual signal to generate a second residual signal; and a noise component encoder for generating a set of noise modeling parameters representing spectral components of the second residual signal.
8. The audio signal encoder of claim 7, further including: means for assembling a parameter stream from the encoded parameters representing the identified spectral peaks in the band signals, the transient component signal parameters and the noise modeling parameters; and means for reducing transmission bandwidth associated with the parameter stream by performing a subset of a predefined set of bandwidth reduction actions.
9. The audio signal encoder of claim 8, wherein the predefined set of bandwidth reduction actions includes a plurality of actions selected from the set consisting of deleting from the parameter stream a subset of the encoded parameters representing the identified spectral peaks in the band signals, reducing how often the noise modeling parameters are included in the parameter stream, deleting from the parameter stream all encoded parameters representing the identified spectral peaks in a highest frequency one of the band signals, reducing how often the encoded parameters are included in the parameter stream for a second highest frequency one of the band signals, reducing how many data bits are used to represent the encoded parameters in the parameter stream, and deleting a subset of the transient component signal parameters.
10. A method of encoding an audio signal, comprising: filtering a digitally sampled audio signal with a multi-complementary filter bank that splits the audio signal into a plurality of band signals, where the plurality of band signals contain contiguous frequency range portions of the audio signal and wherein the band signals are oversampled so as to suppress cross-band aliasing energy in each of the band signals; and analyzing each of the band signals, using for each respective band signal a respective windowing time whose length is inversely proportional to the frequency range of the associated band signal, to identify spectral peaks within each band signal and to generate encoded parameters representing each of the identified spectral peaks.
11. The method of claim 10, further including:
generating a set of sinusoidal waveforms corresponding to the encoded parameters representing the identified spectral peaks;
subtracting the set of sinusoidal waveforms from the audio signal so as to generate a residual signal; and
analyzing and encoding transient signal components in the residual signal with a set of transient component signal parameters.
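The synthesize-and-subtract step of claim 11 reduces, in outline, to summing one sinusoid per encoded peak and subtracting the sum from the input. A minimal Python sketch, assuming for illustration that each encoded peak is a (frequency in Hz, amplitude, phase) triple:

# Sketch of the deterministic synthesis and residual generation steps;
# the (freq_hz, amp, phase) parameter layout is an assumption.
import numpy as np

FS = 44100

def synthesize_sinusoids(peaks, n):
    """Sum one steady sinusoid per encoded (freq_hz, amp, phase) triple."""
    t = np.arange(n) / FS
    y = np.zeros(n)
    for freq_hz, amp, phase in peaks:
        y += amp * np.cos(2 * np.pi * freq_hz * t + phase)
    return y

def residual(x, peaks):
    """Residual signal: input minus the deterministic (sinusoidal) part."""
    return x - synthesize_sinusoids(peaks, len(x))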
12. The method of claim 11, the transient signal component analyzing and encoding step including:
transforming frames of the residual signal into real valued frequency domain frames; and
identifying spectral peaks in respective ones of the frequency domain frames and encoding the identified spectral peaks so as to generate the set of transient component signal parameters for the respective ones of the frequency domain frames.
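Claim 12 does not name a particular real valued frequency domain transform; the sketch below uses a type-II DCT as one plausible choice, under which a brief time-domain transient maps to an oscillatory, sinusoid-like spectrum whose peaks can then be encoded. The frame length and peak-picking rule are assumptions.

# Sketch of the transient encoding step; a type-II DCT stands in for
# the real valued frequency domain transform of the claim.
import numpy as np
from scipy.fft import dct

FRAME = 2048  # assumed transient analysis frame length

def encode_transients(residual):
    """Return per-frame lists of (DCT bin, coefficient) transient peaks."""
    params = []
    for start in range(0, len(residual) - FRAME + 1, FRAME):
        spectrum = dct(residual[start:start + FRAME], type=2, norm='ortho')
        mag = np.abs(spectrum)
        thresh = 0.2 * mag.max() if mag.max() > 0 else 0
        peaks = [(k, spectrum[k]) for k in range(1, FRAME - 1)
                 if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]
                 and mag[k] > thresh]
        params.append(peaks)
    return params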
13. The method of claim 12, further including:
generating a reconstructed transient signal from the transient component signal parameters;
subtracting the reconstructed transient signal from the residual signal to generate a second residual signal; and
generating a set of noise modeling parameters representing spectral components of the second residual signal.
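The noise modeling step of claim 13 can be pictured as reducing each frame of the second residual to a coarse spectral envelope. The sketch below uses uniformly spaced energy bands (a perceptually spaced layout would be equally consistent with the claim); the frame size and band count are assumptions.

# Sketch of the noise modeling step: per-frame band-energy envelopes
# of the second residual. Uniform bands, frame size and band count
# are assumptions made for this example.
import numpy as np

def noise_envelope(second_residual, frame=1024, n_bands=16):
    """Return per-frame band-energy envelopes of the second residual."""
    envelopes = []
    for start in range(0, len(second_residual) - frame + 1, frame):
        mag2 = np.abs(np.fft.rfft(second_residual[start:start + frame])) ** 2
        bands = np.array_split(mag2, n_bands)
        envelopes.append([float(np.sum(b)) for b in bands])
    return envelopes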
14. The method of claim 13, further including:
assembling a parameter stream from the encoded parameters representing the identified spectral peaks in the band signals, the transient component signal parameters and the noise modeling parameters; and
reducing transmission bandwidth associated with the parameter stream by performing a subset of a predefined set of bandwidth reduction actions.
15. The method of claim 14, wherein the predefined set of bandwidth reduction actions includes a plurality of actions selected from the set consisting of deleting from the parameter stream a subset of the encoded parameters representing the identified spectral peaks in the band signals, reducing how often the noise modeling parameters are included in the parameter stream, deleting from the parameter stream all encoded parameters representing the identified spectral peaks in a highest frequency one of the band signals, reducing how often the encoded parameters are included in the parameter stream for a second highest frequency one of the band signals, reducing how many bits are used to represent the encoded parameters in the parameter stream, and deleting a subset of the transient component signal parameters.

16. The method of claim 11, further including:
generating a reconstructed transient signal from the transient component signal parameters;
subtracting the reconstructed transient signal from the residual signal to generate a second residual signal; and
generating a set of noise modeling parameters representing spectral components of the second residual signal.
17. The method of claim 16, further including:
assembling a parameter stream from the encoded parameters representing the identified spectral peaks in the band signals, the transient component signal parameters and the noise modeling parameters; and
reducing transmission bandwidth associated with the parameter stream by performing a subset of a predefined set of bandwidth reduction actions.
18. The method of claim 17, wherein the predefined set of bandwidth reduction actions includes a plurality of actions selected from the set consisting of deleting from the parameter stream a subset of the encoded parameters representing the identified spectral peaks in the band signals, reducing how often the noise modeling parameters are included in the parameter stream, deleting from the parameter stream all encoded parameters representing the identified spectral peaks in a highest frequency one of the band signals, reducing how often the encoded parameters are included in the parameter stream for a second highest frequency one of the band signals, reducing how many data bits are used to represent the encoded parameters in the parameter stream, and deleting a subset of the transient component signal parameters.

19. A computer program product for use in conjunction with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
instructions for filtering a digitally sampled audio signal with a multi-complementary filter bank that splits the audio signal into a plurality of band signals, wherein the plurality of band signals contain contiguous frequency range portions of the audio signal and wherein the band signals are oversampled so as to suppress cross-band aliasing energy in each of the band signals; and
instructions for analyzing each of the band signals, using for each respective band signal a respective windowing time frame whose length is inversely proportional to the frequency range of the associated band signal, to identify spectral peaks within each band signal and to generate encoded parameters representing each of the identified spectral peaks.
20. The computer program product of claim 19, further including:
instructions for generating a set of sinusoidal waveforms corresponding to the encoded parameters generated by the band signal analyzing instructions;
instructions for subtracting the set of sinusoidal waveforms from the audio signal so as to generate a residual signal; and
instructions for analyzing and encoding transient signal components in the residual signal with a set of transient component signal parameters.

21. The computer program product of claim 20, further including:
instructions for transforming frames of the residual signal into real valued frequency domain frames; and
instructions for identifying spectral peaks in respective ones of the frequency domain frames and encoding the identified spectral peaks so as to generate the set of transient component signal parameters for the respective ones of the frequency domain frames.
22. The computer program product of claim 21, further including:
instructions for generating a reconstructed transient signal from the transient component signal parameters;
instructions for subtracting the reconstructed transient signal from the residual signal to generate a second residual signal; and
noise encoding instructions for generating a set of noise modeling parameters representing spectral components of the second residual signal.
23. The computer program product of claim 22, further including:
instructions for assembling a parameter stream from the encoded parameters representing the identified spectral peaks in the band signals, the transient component signal parameters and the noise modeling parameters; and
instructions for reducing transmission bandwidth associated with the parameter stream by performing a subset of a predefined set of bandwidth reduction actions.

24. The computer program product of claim 23, wherein the predefined set of bandwidth reduction actions includes a plurality of actions selected from the set consisting of deleting from the parameter stream a subset of the encoded parameters representing the identified spectral peaks in the band signals, reducing how often the noise modeling parameters are included in the parameter stream, deleting from the parameter stream all encoded parameters representing the identified spectral peaks in a highest frequency one of the band signals, reducing how often the encoded parameters are included in the parameter stream for a second highest frequency one of the band signals, reducing how many bits are used to represent the encoded parameters in the parameter stream, and deleting a subset of the transient component signal parameters.
25. The computer program product of claim 20, further including:
instructions for generating a reconstructed transient signal from the transient component signal parameters;
instructions for subtracting the reconstructed transient signal from the residual signal to generate a second residual signal; and
noise encoding instructions for generating a set of noise modeling parameters representing spectral components of the second residual signal.
26. The computer program product of claim 25, further including:
instructions for assembling a parameter stream from the encoded parameters representing the identified spectral peaks in the band signals, the transient component signal parameters and the noise modeling parameters; and
instructions for reducing transmission bandwidth associated with the parameter stream by performing a subset of a predefined set of bandwidth reduction actions.
27. The computer program product of claim 26, wherein the predefined set of bandwidth reduction actions includes a plurality of actions selected from the set consisting of deleting from the parameter stream a subset of the encoded parameters representing the identified spectral peaks in the band signals, reducing how often the noise modeling parameters are included in the parameter stream, deleting from the parameter stream all encoded parameters representing the identified spectral peaks in a highest frequency one of the band signals, reducing how often the encoded parameters are included in the parameter stream for a second highest frequency one of the band signals, reducing how many data bits are used to represent the encoded parameters in the parameter stream, and deleting a subset of the transient component signal parameters.